To tune or not to tune the number of trees in random forest?

نویسندگان

  • Philipp Probst
  • Anne-Laure Boulesteix
چکیده

The number of trees T in the random forest (RF) algorithm for supervised learning has to be set by the user. It is controversial whether T should simply be set to the largest computationally manageable value or whether a smaller T may in some cases be better. While the principle underlying bagging is that"more trees are better", in practice the classification error rate sometimes reaches a minimum before increasing again for increasing number of trees. The goal of this paper is four-fold: (i) providing theoretical results showing that the expected error rate may be a non-monotonous function of the number of trees and explaining under which circumstances this happens; (ii) providing theoretical results showing that such non-monotonous patterns cannot be observed for other performance measures such as the Brier score and the logarithmic loss (for classification) and the mean squared error (for regression); (iii) illustrating the extent of the problem through an application to a large number (n = 306) of datasets from the public database OpenML; (iv) finally arguing in favor of setting it to a computationally feasible large number, depending on convergence properties of the desired performance measure.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Function of music in illustrating verses related to The Day of Judgment in the Holy Quran

Tune of words and music of sentences in the Holy Quran are not the same and they change in accordance with their meaning and intentions. Type of letters, combination of words in the verses and their increase and decrease have an important effect in tune of the verses. This musical reflection will affect readers‘thought and emotion and also sometimes it illustrates a live and dynamic image...

متن کامل

Residual trees injury assessment after selective cutting in broadleaf forest in Shafaroud

In the Shafaroud forest, logging operation is generally performed by using selective cutting methods. Chainsaw and cable skidder are two main forest machines for harvesting of this forest. However, forest harvesting operations result in serious residual stand damage during felling, winching and skidding operations in this forest. Residual stand damage resulting from selective cutting was asses...

متن کامل

A Study on the Accuracy and Precision of Estimation of the Number, Basal Area and Standing Trees Volume per Hectare Using of some Sampling Methods in Forests of NavAsalem

   The present study aimed to investigate the accuracy and precision estimation of the number, basal area and volume of the standing trees by methods of random and systematic random sampling in the forests of West Guilan. The cost or inventory time was determined using the criteria (E%2 × T). Inventory was carried out by complete sampling (census) in an area of 52 hectares. The study area (sect...

متن کامل

Estimation of Density using Plotless Density Estimator Criteria in Arasbaran Forest

    Sampling methods have a theoretical basis and should be operational in different forests; therefore selecting an appropriate sampling method is effective for accurate estimation of forest characteristics. The purpose of this study was to estimate the stand density (number per hectare) in Arasbaran forest using a variety of the plotless density estimators of the nearest neighbors sampling me...

متن کامل

Evaluation Accuracy of Nearest Neighbor Sampling Method in Zagross Forests

Collection of appropriate qualitative and quantitative data is necessary for proper management and planning. Used the suitable inventory methods is necessary and accuracy of sampling methods dependent the inventory net and number of sample point. Nearest neighbor sampling method is a one of distance methods and calculated by three equations (Byth and Riple, 1980; Cotam and Curtis, 1956 and Cota...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • CoRR

دوره abs/1705.05654  شماره 

صفحات  -

تاریخ انتشار 2017